Self-Supervised Monocular Image Depth Learning and Confidence Estimation
نویسندگان
چکیده
Convolutional Neural Networks (CNNs) need large amounts of data with ground truth annotation, which is a challenging problem that has limited the development and fast deployment of CNNs for many computer vision tasks. We propose a novel framework for depth estimation from monocular images with corresponding confidence in a selfsupervised manner. A fully differential patch-based cost function is proposed by using the Zero-Mean Normalized Cross Correlation (ZNCC) that takes multi-scale patches as a matching strategy. This approach greatly increases the accuracy and robustness of the depth learning. In addition, the proposed patch-based cost function can provide a 0 to 1 confidence, which is then used to supervise the training of a parallel network for confidence map learning and estimation. Evaluation on KITTI dataset shows that our method outperforms the state-of-the-art results.
منابع مشابه
Fusion of stereo and still monocular depth estimates in a self-supervised learning context
We study how autonomous robots can learn by themselves to improve their depth estimation capability. In particular, we investigate a self-supervised learning setup in which stereo vision depth estimates serve as targets for a convolutional neural network (CNN) that transforms a single still image to a dense depth map. After training, the stereo and mono estimates are fused with a novel fusion m...
متن کاملOn the Importance of Stereo for Accurate Depth Estimation: An Efficient Semi-Supervised Deep Neural Network Approach
We revisit the problem of visual depth estimation in the context of autonomous vehicles. Despite the progress on monocular depth estimation in recent years, we show that the gap between monocular and stereo depth accuracy remains large—a particularly relevant result due to the prevalent reliance upon monocular cameras by vehicles that are expected to be self-driving. We argue that the challenge...
متن کاملLearning Depth from Single Monocular Images
We consider the task of depth estimation from a single monocular image. We take a supervised learning approach to this problem, in which we begin by collecting a training set of monocular images (of unstructured outdoor environments which include forests, trees, buildings, etc.) and their corresponding ground-truth depthmaps. Then, we apply supervised learning to predict the depthmap as a funct...
متن کاملSelf-Supervised Siamese Learning on Stereo Image Pairs for Depth Estimation in Robotic Surgery
INTRODUCTION Robotic surgery has become a powerful tool for performing minimally invasive procedures, providing advantages in dexterity, precision, and 3D vision, over traditional surgery. One popular robotic system is the da Vinci surgical platform, which allows preoperative information to be incorporated into live procedures using Augmented Reality (AR). Scene depth estimation is a prerequisi...
متن کاملConditional Models for 3d Human Pose Estimation
OF THE DISSERTATION Conditional Models for 3D Human Pose Estimation by ATUL KANAUJIA Dissertation Director: Dimitris Metaxas Human 3d pose estimation from monocular sequence is a challenging problem, owing to highly articulated structure of human body, varied anthropometry, self occlusion, depth ambiguities and large variability in the appearance and background in which humans may appear. Conve...
متن کامل